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4.  Comments 

This  report  deals  with  our  work  towards  the  Task  2  deliverable  of  this  contract. 
The  objective  of  this  task  is  to  "codify  appropriate  representation  for  selected  narrative 
texts". 

4.1.  Current  Work 

Work  was  begun  on  the  design  of  a  new  representation  of  for  the  narrative  texts 
from  CASREPS.  A  report  was  generated  that  describes  the  problems  with  the  old 
representation  and  also  vaguely  describes  a  new  representation.  The  representation  is 
not  yet  complete  and  work  will  continue  into  the  next  quarter  in  developing  it  and  then 
encoding  it  on  the  Symbolics,  probably  using  KEE.  The  report  submitted  to  the  COTR 
is  attached  to  this  quarterly  report. 
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1.  Knowledge  Representation  of  Natural  Language  Text 

'The  purpose  of  the  TERSE  project  has  been  to  develop  a  natural  language  system 
that  can  perform  the  function  of  highlighting  important  information  in  Navy  CASREPs. 
This  task  involves  research  in  many  different  areas,  all  of  whkk-ar^  dependent  on  one 
another.  The  most  central  issue  in  designing  this  system  is  knowledge  representation. 
This  issue  is  central  because  the  design  of  other  portions  of  the  system  will  be  effected  by 
the  choice  of  knowledge  representation.  For  example,  the  system  which  analyzes  the  raw 
natural  language  text  must  be  designed  in  such  a  way  that  it  can  transform  its  input  into 
this  knowledge  representation.  The  application  component  is  also  tightly  coupled  with 
the  representation  because  it  looks  for  information  contained  in  it  and  make  inferences 
from  that  information. 

A  knowledge  representation  can  have  a  great  effect  on  the  ease  with  which  the  sys¬ 
tem  can  be  defined  and  maintained.  A  knowledge  representation  is  not  just  the  set  of 
datastructures  designed  to  hold  knolwedge  for  an  application^  Barr  and  Feigenbaum 
definfe^a  knowledge  representation  "a  combination  of  data  structures  and  inter¬ 

pretive  procedures  that,  if  used  in  the  right  way  in  a  program,  will  lead  to  knowledge¬ 
able  behavior  .  It  is  important  that  the  representation  exhibit  the  ability  to  capture  this 
heterogeneous  information  in  such  a  way  that  makes  it  easy  to  handle  and  understand 
for  the  human  designers  because  in  large  representation  systems,  things  can  quickly  get 
unmanageable. 

It  is  not  our  goal  to  represent  everything  that  a  human  expert  who  would  read  these 
messages  might  understand.  It  has  been  our  effort  to  only  capture  the  meaning  of  the 
text  at  a  level  tjiat  facilitates  our  particular  application,  "v.-"  '  '  ' 
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1.1.  Level  of  Representation  Being  Targeted 

Natural  language  text  can  be  represented  at  several  different  levels.  There  is  the 
discourse  level  which  reflects  how  each  assertion  contained  in  the  discourse  is  related  to 
another.  The  discourse  level  can  be  divided  into  several  different  levels  each  of  which 
model  different  ways  that  assertions  can  be  connected.  Examples  of  different  types  of 
discourse  relations  are  :  temporal  sequencing,  thematic  progression2,  and  contextual 
implicature3.  These  discourse  relations  are  not  characteristic  to  individual  sentences. 
They  are  usually  intersentential  by  nature,  although  some  mechanisms  for  providing  this 
discourse  level  information  can  be  found  at  the  sentential  level  as  well.  Mechanisms  capa¬ 
ble  of  interpreting  this  level  of  discourse  are  necessary  in  any  system  capable  of  "under¬ 
standing"  natural  language. 

A  more  primitive  level  of  natural  language  structure  is  the  assertional  level.  At  this 
level,  actors  and  objects  are  related  to  some  event,  action,  or  state.  Assertions  can  be 
found  in  nominalized  verbs,  adjectives,  entire  sentences  or  clauses.  Assertions  have  the 
property  of  predicating  something  about  the  actors  and  objects  related  by  the  assertion. 
We  will  call  the  element  which  is  indicative  of  an  event,  state,  or  action,  the  predicate. 
The  actors  and  objects  that  are  associated  with  the  predicate  are  called  its  arguments. 
This  paper  focuses  on  the  level  of  discourse  at  which  assertions  are  made. 
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1.2.  Knowledge  Required  for  the  Assertions!  Level 

In  the  representation  of  predicates,  it  necessary  to  enumerate  all  the  arguments  that 
are  central  to  the  semantic  definition  of  that  predicate.  It  is  also  necessary  to  include 
information  that  tells  us  from  what  field  of  objects  in  the  domain  (from  what  semantic 
class)  each  argument  can  come.  An  equally  important  type  of  information  is  the  seman¬ 
tic  role  information.  This  information  is  the  equivalent  to  the  "procedure"  element  of 
the  equation  "knowledge  representation  =  structure  +  procedure".  The  procedures  mak¬ 
ing  up  the  semantic  role  information  are  what  enable  the  application  portion  a  system  to 
"know"  what  it  knows  about  the  "meaning"  of  that  particular  predicate. 

There  are  some  general  purpose  semantic  roles  that  are  considered  universally 
interesting.  For  example,  roles  dealing  with  such  concepts  as  intentionality,  causality, 
and  instrumentality  would  be  useful  to  almost  any  application  program.  It  is  interesting 
to  have  a  concept  of  event-initiators,  entities  changed  because  of  the  event,  entities  used 
to  effect  the  event  and  other  concepts  needed  for  the  particular  application.  These  con¬ 
cepts  represent  the  rudiments  of  understanding.  More  detailed  concepts  can  be  incor¬ 
porated  in  an  appropriate  manner  as  dictated  by  the  requirements  of  the  application. 


2.  The  CASREP  Implementation 

In  the  Summary  project,  we  have  adopted  an  approach  called  Information  For¬ 
matting.  The  initial  concept  that  was  conceived  by  Zellig  Harris  during  the  late  50’s4. 
He  developed  a  method  called  distributional  analysis  for  deriving  equivalence  classes 
from  natural  language  discourses.  The  classes  that  were  derived  were  referred  to  as 
semantic  categories.  Harris  preformed  this  work  in  an  effort  to  seek  "...  some  global 
structure  characterizing  the  whole  discourse,  or  large  sections  of  it".  Some  of  the  struc¬ 
ture  that  he  was  seeking  was  found  in  patterns  of  these  semantic  categories.  These  pat¬ 
terns  were  found  to  vary  from  one  subset  of  language  to  another.  These  subsets  of 
language  were  named  sublanguages.  Harris  believed  that  through  characterizing  the 
structure  of  a  sublanguage,  some  degree  of  semantics  from  that  sublanguage  would  be 
discovered. 

Theoretically,  distributional  analysis  does  not  make  use  of  any  domain  specific 
knowledge  to  discover  these  classes  of  words.  The  constituents  are  classified  entirely  on 
their  syntactic  characteristics  and  their  surrounding  environment.  This  approach  gave 
Harris  a  methodical  way  of  describing  characteristics  of  narrow  domains  of  discourse  that 
had  never  been  described  (at  least  in  this  manner)  before. 

During  the  70’s,  Harris’  work  was  expanded  on  by  Naomi  Sager5  at  NYU.  She  con¬ 
centrated  much  of  her  effort  on  discovering  and  employing  these  "textual  regularities" 
that  are  characteristic  of  sublanguages.  She  developed  a  process  called  information 
formatting  in  which  texts  were  mapped  from  their  raw  form  into  tables.  Each  of  the 
semantic  categories  developed  for  the  particular  sublanguage  is  a  column  in  the  table. 
Each  line  of  the  table  is  called  a  single  format,  except  for  special  lines  that  perform  the 
function  of  connecting  two  lines  of  the  table.  These  lines  are  called  connectives.  They 
introduce  a  tree-like  structure  that  takes  only  two  arguments.  The  arguments  can  either 
be  a  single  line  format  or  another  connective.  One  of  the  most  important  aspects  of  this 
work  is  that  the  process  of  mapping  natural  language  text  into  these  formatted  tables 
could  be  performed  by  a  computer  program. 

2.1.  Characteristics  of  the  Message  Domain 

The  CASREP  message  domain  constitutes  a  complicated  sublanguage.  CASREPs 
range  in  complexity  from  simple  to  complex,  and  from  almost  entirely  grammatically 
correct  to  ill-formed. 

Test  A  1 

Starting  air  regulating  valve  failed.  Unable  to  consistently 
start  NR  IB  gas  turbine.  Valve  parts  excessively  corroded. 


TestB  7 


SAC  had  local  monitoring  capability  for  lube  oil  pressure  only, 
due  to  the  recent  failure  of  the  sac  lube  oil  pressure  transducer. 

Prior  to  engagement  it  was  reported  that  SAC  LO  pressure  dropped 
to  zero.  No  metallic  particles  in  LO  filters.  Borescope  investiga¬ 
tion  revealed  a  broken  tooth  on  the  hub  ring  gear.  It  is  likely 
the  LO  pump  has  sheared.  The  LO  pressure  and  alarm  capability  is 
a  necessity  for  operation. 

SITREP  002:  Drive  shaft  for  SAC  was  manufactured  locally.  S/F  re¬ 
installed  old  SAC  utilizing  new  drive  shaft.  On  testing  of  SAC  lube 
oil  pressure  could  not  be  adjusted  above  85  PSIG.  Replacement  SAC 
will  be  required.  The  original  drive  shaft,  when  installed,  was 
packed  utilizing  60  grams  of  grease,  when  removed,  on  failure  of 
SAC,  the  drive  shaft  was  dry  and  showed  signs  of  extensive  heat 
stress. 

CASREPs  can  contain  a  great  amount  of  disciission  concerning  the  actions  of  the  partici¬ 
pants  in  discovering  and  attempting  to  correct  the  equipment  failure.  The  previous  two 
messages  are  from  the  SAC  CASREP  message  set.  It  is  apparent  from  the  second  mes¬ 
sage  that  this  "sublanguage"  is  not  that  constrained. 

2.2.  The  CASREP  Information  Formats 

An  initial  set  of  semantic  categories  and  semantic  patterns  was  developed  in  the 
early  stages  of  this  project.  After  some  work  was  begun  on  the  application  end  of  the 
Summary  system,  it  was  determined  that  the  representation  was  lacking  in  its  ability  to 
accurately  and  consistently  represent  information  contained  in  the  texts.  This  problem 
was  addressed  in  several  different  ways.  First,  several  new  semantic  categories  were 
added.  The  display  below  shows  the  original  categories  and  the  categories  which  were 
added  are  shown  below. 


semantic  categories 

example  members 

ADMIN 

forward,  report,  expedite 

FUNC 

rotate,  operate,  engagement 

INVEST 

check,  isolate,  troubleshooting 

MSG 

CASREP,  message 

ORG 

ship,  CCS,  technician 

PART 

sac,  shaft,  gear 

PROCURE 

deliver,  purchase,  reorder 

PROCESS 

manufacture 

PROP 

clearance,  pressure,  capability 

REPAIR 

repair,  adjust,  overhaul 

STASK 

arrival,-  assignment,  transit 

STATUS 

fail,  malfunction,  loss 

new  semantic  categories 

example  members 

ALARM 

alarm 

AREA 

interned,  tip 

PARTLOC 

come 

PIECE 

particle,  chunk,  material 

REQUIRE 

require 

SUBSTANCE 

metal 

In  the  revision,  connectives  were  broken  down  into  several  different  subclasses  of 
connectives.  In  the  original  implementation  all  connectives  were  classified  under  a  single 
heading  of  CONN.  The  following  display  shows  the  new  subheadings  of  CONN. 

new  connectives 

CONJOINED,  TEvE-CONJ,  RELATION,  SUB-CONJ, 

EMBEDDED,  RED  CLAUSE. 

In  the  original  implementation,  a  semantic  category  could  be  modified  by  several 
different  types  of  modifiers  as  shown  below. 

old  category  modifiers 

TIME,  LOC,  QUANT,  EFFORT(man-hours),  NEG,  MODAL 

In  the  revision,  the  EFFORT  modifier  was  dropped  as  there  were  no  occurrences  of  its 
use  in  the  equipment  failure  messages.  The  TIME  modifier  was  removed  from  the 
category  level  and  a  block  of  TIME  modifiers  was  added  to  modify  a  format  as  a  whole. 
This  is  more  appropriate  since  formats  should  be  assertional  elements  and  time  informa¬ 
tion  is  more  appropriately  attached  to  an  assertion  than  to  a  single  word.  The  following 
display  shows  all  the  fields  of  the  TIME  block  record  which  modifies  an  entire  format. 

TIME  modifier  block 
TIME 

verbtense 

change 

beg 

end 

EVENT-TIME 

TIME-PREP  1 

prep 

time- unit 
reference- point 
TIME-PREP2 
prep 

time- unit 
reference- point 


2.3.  The  CAS  REP  Semantic  Patterns 

Eight  semantic  patterns  were  initially  developed  for  a  set  of  CASREP  messages. 
Six  of  these  patterns  are  shown  below.  The  two  that  are  not  shown  were  specific  to  the 
electronic  equipment  messages  and  not  applicable  to  the  starting  air  compressor  mes- 


Pattern  Name 

Pattern 

General  Administration 

ORG  PART  /ADMIN  PROCURER 

Repair 

ORG  PART  STATUS  REPAIR 

Investigation 

ORG  PART  STATUS  INVEST 

Part  State 

PART  FUNC  PROP  STATUS 

Assistance 

ORG  ASSIST 

Ship  Task 

ORG STASK 

Pattern  number  one  contains  a  parenthesized  element  that  indicates  that  either  an 
ADMIN  word  or  a  PROCURE  word  co-occurs  with  the  other  pattern  elements.  Both 
will  not  be  present  at  once.  This  suggests  that  there  are  two  different  patterns:  one  in 
which  PROCURE  occurs  and  one  in  which  ADMEN  occurs.  The  entry  that  contains  two 
parenthesized  elements  indicates  that  the  pattern  comes  in  two  flavors  where  one  of  the 
two  elements  is  present.  The  bold-print  items  are  required  to  identify  the  pattern.  It  is 
generally  the  case  that  all  the  items  in  a  single  pattern  can  be  present  at  one  time  in  an 
assertion  of  that  type. 
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3.  Problems  with  the  Current  Approach 

3.1.  Missing  Semantic  Patterns  and  Unformatted  Information 

In  the  design  of  information  formats  for  the  CASREP  messages,  there  has  been  a 
digression  from  the  idea  that  there  exist  patterns  for  every  type  of  information  in  the 
sublanguage.  Our  CASREP  patterns  cover  only  the  portion  of  patterns  which  represent 
simple  assertions  where  the  assertion  does  not  contain  more  than  one  item  of  the  same 
semantic  category.  All  other  patterns  which  represent  assertions  with  more  than  one 
item  classified  under  a  single  semantic  category  and  assertions  that  take  other  assertions 
as  arguments,  have  not  been  described  and  are  missing  from  our  master  list  of  patterns 
for  this  sublanguage.  Consider  the  following  example  from  the  CASREP  domain: 

Drive  shaft  sheared,  all  internal  gear  teeth  from  drive  adapter  hub 


LINE# 

CONNECTIVE 

PART 

AREA 

RELATION 

1 

shear 

4 

drive  shaft 

3 

gear  teeth 

internal 

No  pattern  developed  in  this  implementation  can  accommodate  two  PART  entries. 
In  the  CASREP  implementation,  the  verb  shear  was  treated  as  a  connective  and  its  two 
PART  arguments  were  put  into  separate  formats.  This  treatment  of  the  problem  intro¬ 
duced  another  problem  because  it  yielded  formats  (for  each  of  the  arguments)  that  were 
non-assertional. 

Actually,  there  are  no  patterns  to  handle  any  connectives.  Therefore,  any  time  we 
have  been  forced  to  use  a  connective  (as  in  the  previous  and  following  examples),  there 
has  been  no  pattern  to  describe  that  information.  The  following  example  shows  how 
predicates  that  take  other  predicates  as  arguments  were  handled.  Basically,  the  high- 
level  predicate  was  treated  as  a  connective.  The  arguments  to  that  connective  were  then 
mapped  into  separate  formats.  Because  the  arguments  were  other  predicates  and  thus 
assertional  themselves,  no  problem  of  non-assertional  formats  was  created  in  handling 
this  particular  situation  in  this  way,  but  still  there  was  no  pattern  to  describe  this 
occurrence  in  the  sublanguage. 

Reduced  capability  of  NR4  SAC  restricts  ships  operation. 


LINE#  CONNECTIVE  I  FUNC  I  PROP  1  STATUS  I  PART 


RELATION 


From  these  examples,  other  implementation-related  problems  can  be  seen.  In  the 
first  example,  the  prepositional  phrase  from  drive  adapter  hub  was  not  formatted  at  all. 
In  the  second  example,  the  word  ships  was  not  formated.  A  significant  amount  of  infor¬ 
mation  from  the  original  text  may  not  be  formatted.  This  is  especially  true  of  preposi¬ 
tional  phrases.  While  it  is  important  for  a  knowledge  representation  to  surpress 
irrelevant  detail,  the  information  which  was  omitted  was  often  important  information 
used  in  understanding  what  was  written  in  the  message. 
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3.2.  Inherent  Problems  with  the  Current  Approach 

Although  several  of  the  problems  mentioned  above  are  related  to  oversights  in  the 
implementation  of  the  CASREP  information  formats,  the  most  central  problem  is  related 
to  a  particular  type  of  information  that  is  not  provided  through  distributional  analysis. 
However,  recognizing  the  need  for  this  missing  information  requires  a  shift  in  the 
theoretical  perspective  imposed  in  the  LSP  system.  In  our  discussion  of  this  problem,  the 
reader  will  recognize  the  markings  of  a  theory  of  linguistics  called  case  frame  theory. 
The  theory  of  linguistics  employed  by  the  LSP  system  (immediate  constituent 
analysis)  and  case  frame  theory  are  rather  diverse  theories.  Immediate  constituent 
theory  analyzes  natural  language  in  terms  of  consituency  relationships.  Case  frame 
theory  looks  at  natural  language  constituents  in  terms  of  their  functionality.  It  is  this 
concept  of  functionality  that  has  not  been  included  in  the  current  CASREP  implementa¬ 
tion. 

3.3.  What  Does  Distributional  Analysis  Give  You? 

To  perform  distributional  analysis  on  a  sample  discourse,  a  set  of  transformations 
must  be  performed  where  sentences  and  nominals  are  normalized  into  Subjech-Verb- 
Object  forms.  These  transformations  and  others  are  described  in  Harris4.  Once  the  sam¬ 
ple  is  completely  normalized,  a  procedure  that  is  also  outlined  in  Harris’  paper  is  applied 
where  equivalence  classes  of  constituents  are  generated. 

The  semantic  classes  Jiat  are  generated  may  not  be  as  semantically  appealing  as 
one  might  hope.  It  is  possible  to  find  classes  of  nouns,  for  example,  that  seem  to  have  in 
common  only  their  syntactic  position  in  the  test  data.  One  phenomenon  that  can 
account  for  this  is  the  transformation  of  different  deep  level  cases  into  the  same  syntactic 
position.  For  example,  in  each  of  the  following  sentences  with  the  same  verb,  the  surface 
syntactic  subject  is  a  different  deep  structure  argument  to  the  verb. 

Mary  cut  the  cake. 

The  knife  cut  the  cake. 

The  cake  cut  easily. 


This  is  potentially  a  problem  with  using  the  results  of  distributional  analysis  "as  is". 
One  must  be  careful  that  an  occurrence  of  this  type  does  not  cause  confusion  in  the  gen¬ 
erated  noun  classes.  However,  if  enough  different  uses  of  the  nouns  involved  in  this  kind 
of  ambiguity  can  be  found  in  the  data,  there  may  exist  justification  for  separating  them 
from  the  same  class.  It  is  also  not  likely  that  this  kind  of  variability  in  subject-fronting 
will  be  present  in  a  narrow  sublanguage  although  it  does  occur  to  a  small  degree  in  the 
CASREPs. 

Sac  engaged. 

We  engaged  the  sac. 


example ' 

This  potential  problem  arises  because  the  normalized  form  from  which  all  the 
analysis  is  performed  is  not  equivalent  to  a  semantic  normal  form.  The  set  of 


transformations  described  by  Harris  do  not  take  into  account  these  deep  structure  argu¬ 
ments  and  therefore  the  "normalized"  SVO  structures  can  still  reflect  variability  in  syn¬ 
tactic  structure.[l] 

Semantic  patterns  are  groups  of  semantic  categories  that  occur  in  some  regularity  in 
a  sublanguage.  Because  of  the  above-mentioned  problem  with  semantic  categories,  care 
must  be  taken  when  looking  at  the  "semantic  patterns"  which  arise  in  a  sublanguage. 
That  is,  when  one  decides  that  certain  sample  sentences  exhibit  the  same  pattern,  the 
possibility  of  differing  semantic  roles  in  the  same  syntactic  position  must  be  considered. 

3.4.  What  Distributional  Analysis  Does  Not  Give  You 

Semantic  patterns  found  through  the  process  of  distributional  analysis,  do  not 
include  the  semantic  role  information  that  describes  the  function  of  its  basic  elements. 
The  best  that  can  be  hoped  for,  is  that  the  analysis  can  provide  you  with  skeletal  struc¬ 
tures  that  have  unnamed  slots  but  have  semantic  restrictions  on  what  can  be  filled  into 
those  slots.  The  semantic  restrictions  on  the  slots  are  provided  by  the  semantic  class 
information.  The  number  of  slots  that  are  contained  in  the  semantic  pattern  corresponds 
to  the  number  of  arguments  that  are  found  to  occur  with  the  predicate  in  the  sub¬ 
language.  Therefore,  distributional  analysis  yielding  semantic  patterns  gives  us  the  term 
"structure"  in  the  equation  "knowledge  representation  =  structure  -f  procedures".  The 
semantic  role  part  of  the  equation  represented  by  the  "procedure"  term  is  not  provided 
by  this  analysis. 


[l]  Harris  does  mention  that  cases  will  arise  where  the  classes  that  are  justified  distributional- 
ly,  will  not  jive  semantically  and  that  the  person  performing  the  distributional  analysis  can  break 
a  class  up  to  give  it  a  more  palatable  semantic  flavor. 


4.  An  Enhanced  Representation 

In  the  following  sections,  an  enhanced  representation  will  be  discussed.  The 
enhancements  being  proposed  would  provide:  a  more  complete  specification  of  the  pat¬ 
terns  in  the  sublanguage,  a  consistent  way  of  handling  nested  assertional  arguments,  an 
explicit  identification  of  predicating  elements,  and  the  possibility  of  being  able  to  com¬ 
bine  some  semantic  and  syntactic  information  in  a  uniform  way. 

4.1.  A  Predicate  Oriented  Representation 

In  the  enhanced  representation  we  will  need  to  represent  the  predicates  as  a  unified 
units  that  contains  the  following  information  or  is  easily  integrated  with  other  knowledge 
sources  containing  the  following  information.  In  these  units  we  need  1)  semantic  type 
information  for  predicate  arguments  2)  semantic  role  information  to  describe  the  rela¬ 
tionship  of  the  argument  to  the  predicate  and  3)  a  way  to  identify  the  actual  predicate 
from  its  arguments.  It  should  be  noted  that  if  we  accomplished  only  number  1,  we 
would  essentially  have  semantic  patterns.  But  as  pointed  out  earlier,  semantic  role  infor¬ 
mation  is  essential  and  making  the  predicate  central  and  distinguished  is  an  important 
point  because  it  makes  other  aspects  of  the  understanding  process  easier  to  deal  with.  We 
envision  the  implementation  of  this  knowledge  representation  in  frame-like6  units  with 
slots  and  even  possibly  with  procedures  attached  to  the  units  and/or  the  slots.  The  typi¬ 
cal  inheritance  behavior  of  a  frame  system  is  also  assumed. 

4.1.1.  Semantic  Role  Information  Revisited 

In  our  discussion  on  Information  Formatting  and  Distributional  Analysis,  it  was 
shown  that  this  semantic  role  information  is  what  was  lacking  in  our  representation.  We 
stated  that  this  information  is  critical  to  a  knowledge  representation  and,  distributional 
analysis  did  not  provide  that  information.  As  far  as  we  know,  there  exists  no  automatr 
able  procedure  to  derive  this  semantic  role  information.  However,  we  can  use  distribu¬ 
tional  analysis  to  help  isolate  the  arguments  to  a  predicate.  But  from  that  point  on,  we 
must  resort  to  using  our  own  knowledge  even  though  we  do  not  understand  that 
knowledge  about  the  meaning  of  the  predicate  and  also  our  knowledge  about  the  require¬ 
ments  of  the  application. 

To  obtain  this  semantic  role  information,  we  take  a  look  at  the  arguments  that  dis¬ 
tributional  analysis  has  highlighted.  With  the  ultimate  application  in  mind,  we  decide 
what  knowledge  sources  would  make  use  of  that  particular  argument.  At  this  point,  we 
have  roughly  assigned  an  interpretation  to  that  argument.  We  assign  some  name  that  is 
descriptive  of  that  relationship  to  the  slot.  This  is  just  a  tool  to  help  us  remember  what 
interpretation  we  wanted  to  use.  It  is  the  implementation  of  the  knowledge  sources 
which  interpret  this  particular  argument  that  really  embody  the  nature  of  the  role  we 
have  assigned  to  that  argument. 

4.1.2.  Primitive  Predicates 

After  going  through  the  process  of  determining  the  role  for  each  predicate  argument 
for  every  predicate  in  the  domain,  it  is  possible  that  some  predicates  will  appear  to  be 
synonymous  to  other  predicates.  This  synonymy  may  be  exhibited  be  a)  two  or  more 
predicates  having  the  same  number  of  arguments  and  b)  the  knowledge  sources  which 
define  the  relationships  of  the  arguments  to  the  predicates  being  identical.  It  may  not  be 


the  case  that  the  predicates  forming  such  a  set  would  be  considered  synonymous  in  all 
other  situations  or  applications,  but  since  all  knowledge  sources  were  created  with  respect 
to  the  goals  of  a  particular  the  application,  it  is  possible  that  for  some  applications  it  is 
not  necessary  to  differentiate  the  nuances  in  meaning  between  some  predicates. 

These  synonymous  predicates  form  sets  of  predicates  that  can  be  described  by  a  sin¬ 
gle  primitive  predicate.  Within  that  set,  we  do  not  need  to  differentiate  meaning.  There¬ 
fore,  in  our  representation,  a  frame  typically  represents  a  predicate  described  at  an 
application-specific  primitive  level  and  it  is  likely  that  more  than  one  predicate  in  the 
domain  will  map  into  that  frame  thus  being  considered  essentially  synonymous. 

4.1.3.  The  Relation  of  One  Predicate  Primitive  to  Another 

Once  we  have  decided  what  predicate  primitives  exist  in  our  subdomain,  other  rela¬ 
tionships  (besides  synonymy)  between  predicates  can  be  established.  For  instance,  there 
could  be  an  hierarchy  of  the  predicates  where  more  specific  concepts  are  found  under 
more  general  concepts.  We  could  thus  exploit  the  inheritance  mechanism  in  a  frame  sys¬ 
tem  to  implement  this  relationship  and  others  in  an  efficient  way.  The  ways  that  predi¬ 
cates  should  be  related  to  one  another  depends  on  the  needs  dictated  by  the  application. 

4.1.4.  Improvements  Over  CASREP  Information  Formats 

The  slots  of  the  frames  will  represent  the  arguments  to  the  predicate.  Because  our 
slots  are  defined  in  terms  of  a  relationship  to  the  predicate  (semantic  role),  there  will  be 
no  difficulties  with  predicates  that  take  two  or  more  arguments  of  the  same  semantic 
class  as  was  a  problem  with  CASREP  patterns.  In  this  situation,  each  argument 
although  possibly  of  the  same  semantic  class,  is  serving  in  a  different  semantic  role  with 
respect  to  the  predicate.  Thus,  there  will  be  separate  slots  to  accommodate  each  of  the 
arguments  with  no  problems  created  by  the  fact  that  two  different  slots  can  be  filled  with 
arguments  into  different  semantic  classes.  The  semantic  class  information  would  be  stored 
as  a  type  restriction  on  the  slot,  not  as  a  slot  itself  as  was  the  case  with  CASREP  pat¬ 
terns. 

This  representation  will  not  differentiate  between  predicates  taking  simple  elements 
as  arguments  and  predicates  taking  other  predicates  as  arguments.  Each  argument  to  a 
predicate,  regardless  of  its  complexity,  will  be  filled  into  a  unique  slot.  This  does  not 
cause  any  concern  because  it  is  the  role  relationship  that  is  being  stressed  in  this  predi¬ 
cate  oriented  representation,  not  the  logical  complexity  of  the  actual  argument.  This 
eliminates  the  problems  caused  by  representing  some  predicates  as  formats  and  others  as 
connectives  in  the  CASREP  implementation. 

By  including  the  appropriate  knowledge  in  our  representation  and  by  choosing  a 
representation  which  makes  the  inclusion  of  that  knowledge  straight-forward,  we  have 
enabled  ourselves  to  define  ALL  of  the  necessary  predicate  frames  for  any  domain.  This 
representation  can  accommodate  some  of  the  information  needed  in  all  phases  of  the 
understanding  process  (both  syntactic  and  semantic  phases).  The  lexicon  of  predicates 
can  be  defined  in  terms  of  these  structures  and  this  structure  could  be  filled  in  during 
parsing  and  then  shipped  off  to  the  application  component  for  further  processing. 

So,  as  an  improvement  over  LSP  Information  Formatting  approach,  we  have  pro¬ 
vided  a  datastructure  that  can  be  used  during  all  phases  of  processing  and  that  demands 


all  predicate-related  information  be  defined  in  one  place.  Collecting  this  information  in 
one  place  seems  like  a  anti-climatic  benefit,  but  it  is  one  of  the  most  profound  benefits  to 
be  gained  in  this  representation.  Serious  problems  of  keeping  up  with  all  this  informa¬ 
tion  in  LSP  have  caused  the  integrity  of  the  information  contained  in  the  LSP  system  to 
be  compromised.  In  LSP,  predicate  type  information  can  be  found  in  at  least  the  lexi¬ 
con,  restrictions,  lists,  and  computed  attribute  procedures  thus  making  it  more  difficult 
to  maintain  and  modify. 

While  this  representation  may  not  be  adequate  as  an  ultimate  representation  for 
every  application,  it  will  at  least  serve  as  a  good  intermediate  representation.  It  is 
powerful  in  that  everything  can  be  represented  and  it  is  versatile  in  that  it  is  not  domain 
dependent  in  any  way.  In  the  following  section,  we  will  discuss  how  this  approach  could 
be  combined  with  a  theory  of  linguistics  from  which  even  greater  potential  benefits  might 
be  derived. 

4.2.  Applying  Case  Frame  Theory 

4.2.1.  Role  Abstractions  and  Case 

Linguists  have  attempted  to  isolate  a  set  of  abstract  roles  from  which  ALL  predi¬ 
cates  can  be  defined.  They  call  this  level  of  description  the  deep  structure.  From  a  deep 
structure,  linguists  hoped  to  be  able  to  describe  a  set  of  transformations  that  could 
account  (at  least)  for  variations  in  where  certain  arguments  can  appear  in  surface  struc¬ 
tures.  The  set  of  abstract  roles  that  can  be  found  for  all  predicates  in  the  world  should 
be  language  independent  and  only  their  manifestation  in  a  particular  language  would 
differ.  Charles  Fillmore  is  credited  with  developing  this  theory  which  is  called  case 
frame  theory. 

Fillmore  presented  an  initial  set  of  role  relationships  called  cases  in  the  first  famous 
paper  on  the  topic7.  In  a  later  paper8,  he  revised  that  list.  Many  different  researchers 
including  Grimes9  ,  Chafe10  and  Schank11  have  proposed  different  sets  of  cases  as  those 
that  would  suffice  to  describe  deep  structure  and  also  a  set  of  transformations  on  those 
cases  which  explains  why  different  predicates  order  their  arguments  differently  in  their 
surface  syntactic  manifestation.  Unfortunately,  no  generally  accepted  set  has  ever  been 
developed. 

4.2.2.  The  Benefits  of  Case  Theory 

Because  the  case  frame  is  thought  to  be  deep  level  and  language  independent,  it 
seems  intuitive  that  the  relationship  between  predicates  and  their  cases  would  somehow 
encompass  some  level  of  semantic  information  about  the  predicate.  Therefore,  the  great 
hope  for  a  universal  theory  was  further  encouraged  because  this  syntactic  approach 
seems  to  somehow  explain  a  certain  level  of  semantics,  as  well.  Bridging  at  least  some  of 
the  gap  between  syntax  and  semantics  could  have  significant  ramifications  on  the  design 
of  natural  language  systems.  This  aspect  of  case  frame  theory  is  what  has  attracted  our 
attention  to  case  frames. 

By  "making  up"  our  own  roles,  we  have  not  precluded  the  possibility  of  classifying 
our  made-up  roles  in  terms  of  the  abstract  roles.  When  we  created  our  roles,  we  were 
not  generating  more  or  less  roles  than  a  predicate  has,  we  were  just  defining  the  interpre¬ 
tation  of  those  roles  in  an  application  specific  way.  So,  assuming  that  there  is  such  a 


thing  as  a  basic  set  of  cases  (roles)  and  given  enough  information  about  how  to  identify 
those  cases,  we  should  be  able  to  superimpose  the  abstract  cases  on  our  application- 
specific  cases. 

Given  that  we  can  perform  this  superimposition,  we  can  exploit  the  current  state  of 
research  in  case  theory  and  define  some  syntactic  transformations  which  can  account  for 
some  of  the  syntactic  variability  in  surface  structures.  The  most  important  aspect  of 
using  case  theory  is  that  we  can  associate  these  transformational  rules  with  particular 
case  frames  in  a  very  modular  and  organized  manner.  Specific  transformational  rules 
can  be  included  in  the  definition  of  the  predicate’s  frame.  Therefore,  once  we  have 
identified  the  predicating  element  in  an  syntactic  unit,  we  can  quit  trying  all  the  possible 
permutations  of  its  arguments  and  only  try  to  parse  its  arguments  through  the  transfor¬ 
mations  that  are  part  of  its  definition.  At  the  same  time  we  are  trying  to  discover  the 
syntactic  structure,  we  can  use  semantic  type  checking  on  the  candidate  arguments  in 
order  to  rule  out  bad  analyses  on  semantic  grounds. 

4.3.  The  Prototype  Knowledge  Representation  System 

The  following  section  includes  some  samples  of  what  predicates  exist  in  the  starting 
air  compressor  CASREP  domain.  These  frames  were  derived  by  hand  but,  certainly  an 
attempt  to  be  methodical  was  made.  We  believe  that  these  frames  would  be  an  improve¬ 
ment  over  information  formats  for  reasons  previously  discussed.  This  set  is  not  to  be 
considered  complete  or  absolute.  The  process  of  deriving  a  good  set  of  frames  is  a 
difficult  one  and  depends  greatly  on  the  goals  of  the  application.  These  frames  were 
developed  with  no  particular  application  in  mind,  just  with  an  idea  of  what  the  major 
types  of  information  contained  in  these  messages  are.  Several  examples  from  our  test 
data  will  be  presented  with  each  predicate  class  to  give  the  reader  an  idea  of  what  predi¬ 
cates  would  be  members  of  the  class.  In  the  specification  of  the  frame,  the  first  line  is 
the  name  of  the  frame,  the  following  lines  are  the  slots  of  the  frame.  On  each  slot  line, 
the  first  element  is  the  role  name.  The  second  element,  printed  in  capital  letters,  is  the 
semantic  class  restrictions  on  arguments  filling  that  slot.  Some  semantic  class  restrictions 
are  noun  classes  but  others  will  be  frame  names  when  an  predicate  can  serve  as  an  argu¬ 
ment  to  another  predicate.  Since  we  have  not  developed  this  representation  for  a  partic¬ 
ular  application,  no  knowledge  sources  are  provided  to  show  how  to  interpret  the  roles 
assigned  to  particular  predicates.  We  did  do  some  guessing  based  on  our  experience  with 
the  previous  TERSE  system  application  and  this  is  reflected  in  our  choice  of  slot  names 
in  the  following  examples. 


BEHAVIOR-FUNC 

mechanical-  processor 

EQUIP 

human-operator 

ORG 

examples  of  BEHAVIOR- FUNC  predicates 

a4.1.3  engine  jacks  over 
bl4.1.2  starting  air  compressor  engaged 

bl6.1.4  lube  oU  pump  seized 

b!2.1.1  engagement  of  sac 


DAMAGE 


damager  EQUIP 
damagee  EQUIP 


examples  of  DAMAGE  predicates 

bl7.1.1  gear  homing  cracked 
bl6.1.5  driven  gear  was  sheared 

a24.1.I  wiped  bearings 


ATTRIBUTIVE 

described  *any-noun-or-predicate* 


examples  of  ATTRIBUTTVE  predicates 

b7.1.U  the  drive  shaft  was  dry 

a25.1.3  faulty  high  speed  rotating  assembly 

a31.1.1b  normal  sac  lo  pressure 


BE-LOC 

located 

(PIECE 

MATERIAL  PHYSICAL-FEATURE) 

location 

(EQUIP 

SUBST) 

orientation 

T7 

examples  of  BE-LOC  predicates 

a5.1.3  metal  particles  (be)  in  strainer 
bl7.1.2  large  crack  (be)  in  gear  housing  on  aft  end 


POSSESS 

possessor  (EQUIP  SUBST) 

_ possessed  (PHYSICAL-FEATURE  EQUIP  PROPERTY  CAPABILITY  ATTRIBUTIVE) 


examples  of  POSSESS  predicates 


srep30  the  drive  shaft  has  a  drilled  passage 

bl-t.1.4  lube  oil  has  burnt  odor 

srepl8  s/f  does  not  have  capability  to  t/s  further 


CAPABILITY 


do-er 

goal 


(EQUIP  ORG) 

*any-goal-oriented-predicate-frame 


examples  of  CAPABILITY  predicates 
srepl8  capability  to  t/s 

al.1.2  unable  to  consistently  start  nr  lb  gas  turbine 
b7.1.1  local  monitoring  capability 


FAIL 

processor 

(EQUIP  ORG  PROPERTY) 

goal 

*any-goal-oriented-predicate-frame* 

examples  of  FAIL  predicates 

al.1.1  starting  air  regulating  valve  failed 

bl3.1.1  oil  pressure  failed 

srep8  sac  fails  to  maintain  lube  oil  pressure 


MAINTAIN 

agent 

(ORG  EQUIP) 

object 

PROPERTY 

recipient 

EQUIP 

examples  of  MAINTAIN  predicates 
srep3  (unable  to)  maintain  minimum  oH  pressure 


examples  of  CHANGE--  QUANTITY  predicates 
lo  pressure  was  dropping 


a9.1.2 

a21.1.1a  nr  4  •sac  oil  pressure  dropped  below  alarm  point  of  65  psig 

b7.1.9  lube  oil  pressure  could  not  be  adjusted  above  35  psig 
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